CFILT: Resource Conscious Approaches for All-Words Domain Specific WSD

Authors

  • Anup Kulkarni
  • Mitesh M. Khapra
  • Saurabh Sohoney
  • Pushpak Bhattacharyya
Abstract

We describe two approaches to all-words Word Sense Disambiguation (WSD) on a specific domain. The first is a knowledge-based approach that extracts the largest domain-specific connected components from the WordNet graph by exploiting the semantic relations between all candidate synsets appearing in an untagged domain-specific corpus. Given a test word, disambiguation considers only those candidate synsets that belong to the top-k largest connected components. The second is a weakly supervised approach that relies on the “One Sense Per Domain” heuristic and uses a few hand-labeled examples for the most frequently appearing words in the target domain. Once the most frequent words have been disambiguated, they provide strong clues for disambiguating the other words in a sentence through an iterative disambiguation algorithm. Our weakly supervised system gave the best performance among all systems that participated in the task, even when it used as few as 100 hand-labeled examples from the target domain.
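
As a rough illustration of the first approach, the sketch below builds a graph over candidate synsets, finds its connected components, keeps the k largest, and filters a test word's senses against them. The synset labels, the edge list, and all function names are hypothetical stand-ins chosen for this example; WordNet is not queried, and the abstract does not specify how the paper builds the graph or chooses among the surviving senses, so none of that is reproduced here.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Return the connected components (as sets) of the graph induced on `nodes`."""
    nodes = set(nodes)
    adjacency = defaultdict(set)
    for a, b in edges:
        # Keep only relations whose two ends are both candidate synsets.
        if a in nodes and b in nodes:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, components = set(), []
    for start in sorted(nodes):  # sorted for a deterministic order
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def top_k_components(nodes, edges, k):
    """Return the k largest connected components, largest first."""
    components = connected_components(nodes, edges)
    return sorted(components, key=len, reverse=True)[:k]


def filter_candidates(word_senses, components):
    """Keep only the senses that fall inside one of the given components."""
    allowed = set().union(*components) if components else set()
    return [sense for sense in word_senses if sense in allowed]


# Toy data (hypothetical labels, not real WordNet synset identifiers).
candidate_synsets = {
    "bank.river", "shore", "water", "bank.finance", "money", "cricket.insect",
}
relations = [
    ("bank.river", "shore"),
    ("shore", "water"),
    ("bank.finance", "money"),
]

largest = top_k_components(candidate_synsets, relations, k=1)
surviving = filter_candidates(["bank.river", "bank.finance"], largest)
print(surviving)
```

In this toy graph the river-related synsets form the largest component, so with k=1 only the river sense of "bank" survives the filter; raising k to 2 would let the finance sense through as well.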

Similar resources

All Words Domain Adapted WSD: Finding a Middle Ground between Supervision and Unsupervision

In spite of decades of research on word sense disambiguation (WSD), all-words general-purpose WSD has remained a distant goal. Many supervised WSD systems have been built, but the effort of creating sense-annotated training corpora has always been a matter of concern. Therefore, attempts have been made to develop unsupervised and knowledge-based techniques for WSD which do not...

HR-WSD: System Description for All-Words Word Sense Disambiguation on a Specific Domain at SemEval-2010

The document describes the knowledge-based Domain-WSD system using heuristic rules (knowledge base). This HR-WSD system delivered the best performance (55.9%) among all Chinese systems in SemEval-2010 Task 17: All-words WSD on a Specific Domain.

Knowledge-Based WSD on Specific Domains: Performing Better than Generic Supervised WSD

This paper explores the application of knowledge-based Word Sense Disambiguation systems to specific domains, based on our state-of-the-art graph-based WSD system that uses the information in WordNet. Evaluation was performed over a publicly available domain-specific dataset of 41 words related to Sports and Finance, comprising examples drawn from three corpora: one balanced corpus (BNC), and two...

Knowledge-Based WSD and Specific Domains: Performing Better than Generic Supervised WSD

This paper explores the application of knowledge-based Word Sense Disambiguation systems to specific domains, based on our state-of-the-art graph-based WSD system that uses the information in WordNet. Evaluation was performed over a publicly available domain-specific dataset of 41 words related to Sports and Finance, comprising examples drawn from three corpora: one balanced corpus (BNC), and two...

TreeMatch: A Fully Unsupervised WSD System Using Dependency Knowledge on a Specific Domain

Word sense disambiguation (WSD) is one of the main challenges in Computational Linguistics. TreeMatch is a WSD system originally developed using data from SemEval 2007 Task 7 (Coarse-grained English All-words Task) that has been adapted for use in SemEval 2010 Task 17 (All-words Word Sense Disambiguation on a Specific Domain). The system is based on a fully unsupervised method using dependency k...

Publication date: 2010